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Reactive point processes (RPPs) are a new statistical model de¬ 
signed for predicting discrete events in time based on past history. 

RPPs were developed to handle an important problem within the do¬ 
main of electrical grid reliability: short-term prediction of electrical 
grid failures (“manhole events”), including outages, fires, explosions 
and smoking manholes, which can cause threats to public safety and 
reliability of electrical service in cities. RPPs incorporate self-exciting, 
self-regulating and saturating components. The self-excitement oc¬ 
curs as a result of a past event, which causes a temporary rise in 
vulner ability to future events. The self-regulation occurs as a re¬ 
sult of an external inspection which temporarily lowers vulnerability 
to future events. RPPs can saturate when too many events or in¬ 
spections occur close together, which ensures that the probability of 
an event stays within a realistic range. Two of the operational chal¬ 
lenges for power companies are (i) making continuous-time failure 
predictions, and (ii) cost/benefit analysis for decision making and 
proactive maintenance. RPPs are naturally suited for handling both 
of these challenges. We use the model to predict power-grid failures in 
Manhattan over a short-term horizon, and to provide a cost/benefit 
analysis of different proactive maintenance programs. 

1. Introduction. We present a new statistical model for predicting dis¬ 
crete events over time, called Reactive Point Processes (RPPs). RPPs are 
a natural fit for many different domains, and their development was moti¬ 
vated by the problem of predicting serious events (fires, explosions, power 
failures) in the underground electrical grid of New York City (NYC). In New 
York City and in other major urban centers, power-grid reliability is a major 
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source of concern, as demand for electrical power is expected to soon exceed 
the amount we are able to deliver with our current infrastructure [DOE 
(2008), Rhodes (2013), NYBC (2010)]. Many American electrical grids are 
massive and have been built gradually since the time of Thomas Edison in 
the 1880s. For instance, in Manhattan alone, there are over 21,216 miles 
of underground cable, which is almost enough cable to wrap once around 
the earth. Manhattan’s power distribution system is the oldest in the world, 
and NYC’s power utility company. Con Edison, has cable databases that 
started in the 1880s. Within the last decade, in order to handle increasing 
demands on NYC’s power-grid and increasing threats to public safety. Con 
Edison has developed and deployed various proactive programs and policies 
[So (2004)]. In Manhattan, there are approximately 53,000 access points 
to the underground electrical grid, which are called electrical service struc¬ 
tures or manholes. Problems in the underground distribution network are 
manifested as problems within manholes, such as underground burnouts or 
serious events. A multi-year, ongoing collaboration to predict these events in 
advance was started in 2007 [Rudin et al. (2010, 2012, 2014)], where diverse 
historical data were used to predict manhole events over a long-term hori¬ 
zon, as the data were not originally processed enough to predict events in 
the short term. Being able to predict manhole events accurately in the short 
term could immediately lead to reduced risks to public safety and increased 
reliability of electrical service. The data from this collaboration have suf¬ 
ficiently matured due to iterations of the knowledge discovery process and 
maturation of the Con Edison inspections program, and, in this paper, we 
show that it is indeed possible to predict manhole events to some extent 
within the short term. 

The fact that RPPs are a generative model allows them to be used for 
cost-benefit analysis, and thus for policy decisions. In particular, since we 
can use RPPs to simulate power failures into the future, we can also simulate 
various inspection policies that the power company might implement. This 
way we can create a robust simulation setup for evaluating the relative costs 
of different inspection policies for NYC. This type of cost-benefit analysis can 
quantify the cost of the inspections program as it relates to the forecasted 
number of manhole events. 

RPPs capture several important properties of power failures on the grid: 

• There is an instantaneous rise in vulnerability to future serious events 
immediately following an occurrence of a past serious event, and the vul¬ 
nerability gradually fades back to the baseline level. This is a type of 
self-exciting property. 

• There is an instantaneous decrease in vulnerability due to an inspection, 
repair or other action taken. The effect of this inspection fades gradually 
over time. This is a self-regulating property. 
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• The cumulative effect of events or inspections can saturate, ensuring that 
vulnerability levels never stray too far beyond their baseline level. This 
captures diminishing returns of many events or inspections in a row. 

• The baseline level can be altered if there is at least one past event. 

• Vulnerability between similar entities should be similar. RPPs can be 
incorporated into a Bayesian framework that shares information across 
observably similar entities. 

RPPs extend self-exciting point processes (SEPPs), which have only the 
self-exciting property mentioned above. Self-exciting processes date back 
at least to the 1960s [Bartlett (1963), Kerstan (1964)]. The applicability of 
self-exciting point processes for modeling and analyzing time-series data has 
stimulated interest in diverse disciplines, including seismology [Ogata (1988, 
1998)], criminology [Porter and White (2012), Mohler et al. (2011), Eges- 
dal et al. (2010), Lewis et al. (2010), Louie, Masaki and Allenby (2010)], 
finance [Chehrazi and Weber (2011), Ai’t-Sahalia, Cacho-Diaz and Laeven 
(2010), Bacry et al. (2013), Filimonov and Sornette (2012), Embrechts, Lin- 
iger and Lin (2011), Hardiman, Bercot and Bouchaud (2013)], computa¬ 
tional neuroscience [Johnson (1996), Krumin, Reutsky and Shoham (2010)], 
genome sequencing [Reynaud-Bouret and Schbath (2010)] and social net¬ 
works [Crane and Sornette (2008), Mitchell and Cates (2010), Simma and 
Jordan (2010), Masuda et al. (2012), Du et al. (2013)]. These models appear 
in so many different domains because they are a natural fit for time-series 
data where one would like to predict discrete events in time, and where the 
occurrence of a past event gives a temporary boost to the probability of 
an event in the future. A recent work on Bayesian modeling for dependent 
point processes is that of Guttorp and Thorarinsdottir (2012). Paralleling 
the development of frequentist literature, many Bayesian approaches are mo¬ 
tivated by data on natural events. Peruggia and Santner (1996), for example, 
develop a Bayesian framework for the Epidemic-Type-Aftershock-Sequences 
(ETAS) model. Nonparametric Bayesian approaches for modeling data from 
nonhomogeneous point pattern data have also been developed [see Taddy 
and Kottas (2012), e.g.]. Blundell, Beck and Heller (2012) present a non¬ 
parametric Bayesian approach that uses Hawkes models for relational data. 
An expanded related work section appears in the supplementary material 
[Ertekin, Rudin and McCormick (2015)]. 

The self-regulating property can be thought of as the effect of an in¬ 
spection. Inspections are made according to a predetermined policy of an 
external source, which may be deterministic or random. In the application 
that self-exciting point processes are the most well known for, namely, earth¬ 
quake modeling, it is not possible to take an action to preemptively reduce 
the risk of an earthquake; however, in other applications it is clearly possi¬ 
ble to do so. In our power failure application, power companies can perform 
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preemptive inspections and repairs in order to decrease electrical grid vul¬ 
nerability. In neuroscience, it is possible to take an action to temporarily 
reduce the firing rate of a neuron. There are many actions that police can 
take to temporarily reduce crime in an area (e.g., temporary increased pa¬ 
trolling or monitoring). In medical applications, doses of medicine can be 
preemptively applied to reduce the probability of a cardiac arrest or other 
event. Alternatively, for instance, the self-regulation can come as a result of 
the patient’s lab tests or visits to a physician. 

Another way that RPPs expand upon SEPPs is that they allow devi¬ 
ations from the baseline vulnerability level to saturate. Even if there are 
repeated events or inspections in a short period of time, the vulnerability 
level still stays within a realistic range. In the original self-exciting point 
process model, it is possible for the self-excitation to escalate to the point 
where the probability of an event gets very close to one, which is generally 
unrealistic. In RPPs, the saturation function prevents this from happening. 
Also, if many inspections are done in a row, the vulnerability level does not 
drop to zero, and there are diminishing returns for the later ones because of 
the saturation function. 

Outline of paper. We motivate RPPs using the power-grid application in 
Section 2. We first introduce the general form of the RPP model in Section 3. 
We discuss a Bayesian framework for fitting RPPs in Section 4. The Bayesian 
formulation, which we implement using Approximate Bayesian Computa¬ 
tion (ABC), allows us to share information across observably similar entities 
(manholes in our case). For both methods we fit the model to NYC data and 
performed simulation studies. Section 5 contains a prediction experiment, 
demonstrating the RPPs’ ability to predict future events in NYC. Once the 
RPP model is fit to data from the past, it can be used for simulation. In 
particular, we can simulate various inspection policies for the Manhattan 
grid and examine the costs associated with each of them in order to choose 
the best inspection policy. Section 6 shows this type of simulation using the 
RPP, illustrating how it is able to help choose between different inspection 
policies, and thus assist with broader policy decisions for the NYC inspec¬ 
tions program. The paper’s supplementary material [Ertekin, Rudin and 
McCormick (2015)] includes a related work section, conditional frequency 
estimator (CF estimator) for the RPP, experiments with a maximum like¬ 
lihood approach, a description of the inspection policy used in Section 6 
and simulation studies for validating the fitting techniques for the models 
in the paper. It also includes a description and link for a publicly available 
simulated data set that we generated, based on statistical properties of the 
Manhattan data set. 

A short version of this paper appeared in the late-breaking developments 
track of AAAI-13 [Ertekin, Rudin and McCormick (2013)]. 
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FDNY/250 REPORTS F/0 45536 E.51 ST & BEEKMAN PL...MANHOLE FIRE 
MALDONADO REPORTS F/0 45536 E.51 ST FOUND SB-9960012 SMOKING 
HEAVY...ACTIVE...SOLID...ROUND...NO STRAY VOLTAGE...29-L... 

SNOW...FLUSH REQUESTED... ORDERED #100103. 

12/22/09 08:10 MALDONADO REPORTS 3 2WAY-2WAY CRABS COPPERED 
CUT OUT & REPLACED SAME. ALSO STATES 5 WIRE CROSSING COMES U 
P DEAD WILL INVESTIGATE IN SB-9960013. 

FLUSH # 100116 ORDERED FOR SAME 

12/22/09 14:00 REMARKS BELOW WERE ADDED BY 62355 

12/22/09 01:45 MASON REPORTS F/0 4553 E.51ST CLEARED ALL 

B/O-S IN SB9960013 ALSO FOUND A MAIN MISSING FROM THE WEST IN 

12/22/09 14:08 REMARKS BELOW WERE ADDED BY 62355 

SB9960011 F/0 1440 BEEKMAN.JMC 

Fig. 1. Part of the ECS remarks from a manhole fire tieket in 2009. The ticket implies 
that the manhole was actively smoking upon the worker’s arrival. The worker located a 
crab connector that had melted (‘‘coppered”) and a cable that was not carrying current 
(“dead”). Addresses and manhole numbers were changed for the purpose of anonymity. 


2. Description of data. The data used for the project includes records 
from the Emergency Control Systems (ECS) trouble ticket system of Con 
Edison, which includes records of responses to past events (total 213,504 
records for 53,525 manholes from 1995 until 2010). Part of the trouble ticket 
for a manhole fire is in Figure 1. 

Events can include serious problems such as manhole fires or explosions, or 
nonserious events such as wire burnouts. These tickets are heavily processed 
into a structured table, where each record indicates the time, manhole type 
(“service box” or “manhole,” and we refer to both types as manholes collo¬ 
quially), the unique identifier of the manhole and details about the event. 
The trouble tickets are classified automatically as to whether they represent 
events (the kind we would like to predict and prevent) or not (in which case 
the ticket is irrelevant and removed). The processing of tickets is based on a 
study where Con Edison engineers manually labeled tickets, and is discussed 
further by Passonneau et al. (2011). 

We have more or less complete event data from 1999 until the present, and 
incomplete event data between 1995 and 1999. A plot of the total number 
of events per year (using our definition of what constitutes an event) is 
provided in Figure 2(a). 

We also have manhole location and cable record information, which con¬ 
tains information about the underground electrical infrastructure. These two 
large tables are joined together to determine which cables enter into which 
manholes. The inferential join between the two tables required substantial 
processing in order to correctly match cables with manholes. Main cables 
are cables that connect two manholes, as opposed to service or streetlight 
cables which connect to buildings or streetlights. In our studies on long-term 
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Eig. 2. A plot of the number of yearly events, a histogram of the number of Main Phase 
(PH) eables, and a histogram of the age of oldest eable set in a manhole. 


prediction of power failures, we have found that the number of main phase 
cables in a manhole is a relatively useful indicator of whether a manhole is 
likely to have an event. Figure 2(b) contains a histogram of the number of 
main phase cables in a manhole. 

The electrical grid was built gradually over the last ^130 years, and, 
as a result, manholes often contain cables with a range of different ages. 
Figure 2(c) contains a histogram of the age of the oldest main cables in each 
manhole, as recorded in the database. Cable age is also used as a feature 
for our RPP model. Cable ages range from less than a year old to over 100 
years old; Con Edison started keeping records back in the 1880s during the 
time of Thomas Edison. We remark that it is not necessarily true that the 
oldest cables are the ones most in need of replacement. Many cables have 
been functioning for a century and are still functioning reliably. 

We also have data from Con Edison’s new inspections program. Inspec¬ 
tions can be scheduled in advance, according to a schedule determined by 
a state mandate. This mandate currently requires an inspection for each 
structure at least once every 5 years. Con Edison also performs “ad hoc” 
inspections. These occur when a worker is inside a manhole for another pur¬ 
pose (e.g., to connect a new service cable) and chooses to fill in an inspection 
form. The inspections are broken down into 5 distinct types, depending on 
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whether repairs are urgent (Level I) or whether the inspector suggests ma¬ 
jor infrastructure repairs (Level IV) that are placed on a waiting list to be 
completed. Sometimes when continued work is being performed on a single 
manhole, this manhole will have many inspections performed within a rel¬ 
atively small amount of time—hence our need for “diminishing returns” on 
the influence of an inspection that motivates the saturation function of the 
RPP model. 

Some questions of interest to power companies are as follows: 

(i) Can we predict failures continuously in time, and can we model how 
quickly the influence of past events and inspections fade over time? 

(ii) Can we develop a cost/benefit analysis for proactive maintenance 
policies? 

RPPs will help with both of these questions. 


3. The reactive point process model. We begin with a simpler version 
of RPPs where there is only one time-series corresponding to a single en¬ 
tity (manhole). Our data consist of a series of Ne events with event times 
, ^ 2 ,..., and a series of given inspection times denoted by , ^ 2 ,..., twj • 
The inspection times are assumed to be under the control of the exper¬ 
imenter. RPPs model events as being generated from a nonhomogeneous 
Poisson process with intensity A(t) where 


(1) m =Ao 


1 + 51 ( XI - ^e) ] - 53 ( X 


\/te<t 




yti<t 


where are event times and ti are inspection times. The vulnerability level 
permanently goes up by Ci if there is at least one past event, where Ci is a 
constant that can be fitted. The Cil[ 7 V£;>i] term is present to deal with “zero 
inflation,” where the case of zero events needs to be handled separately than 
one or more past events. Functions g 2 and are the self-excitation and self¬ 
regulation functions, which have initially large amplitudes and decay over 
time. Self-exciting point processes have only ^2, and not the other functions, 
which are novel to RPPs. Functions gi and gs are the saturation functions, 
which start out as the identity function and then flatten farther from the 
origin. If the total sum of the excitation terms is large, gi will prevent the 
vulnerability level from increasing too much. Similarly, g^ controls the total 
possible amount of self-regulation and encodes “diminishing returns” for 
having several inspections in a row. 

The RPP model arose based on exploratory work performed using a condi¬ 
tional frequency (CF) estimator of the data. To construct the CF estimator, 
we computed the empirical probability of another event occurring on a day t 
given that an event occurred at t = 0. To obtain these probabilities, we first 
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(a) Empirical probabilities and fitted values 
for the self-excitation function g 2 


x10“^ 



(b) Empirical probabilities and fitted values 
for the saturation function gi 


Eig. 3. Fitted functions for empirical probabilities for the Manhattan data set. These 
figures display results for the conditional frequency estimator used to derive the form of 
the RPP model. The left figure shows the empirical probability of another event given a 
previous event a given number of days in the past. The decreasing empirical probability with 
time motivates our self-excitation function. The right plot shows the increase in propensity 
for another event given the total cumulative probability from past events. The curvature 
indicates that additional events have diminishing returns on the likelihood of another event, 
motivating the saturation component of the RPP. 


align the sequences of time so that t = 0 represents the time when an event 
happened. We now have a series of “trails” that give the probability of an¬ 
other event, conditional on the last event that occurred for a given manhole. 
We used only trails that were far apart in time so we could look at the effect 
of each event without considering short-term inffuences of other previous 
events. What we see from Figure 3(a) is that the conditional probability 
for experiencing a second event soon after the first event is high and de¬ 
cays with t. This decay represents self-exciting behavior. To see evidence of 
self-excitation from the raw data, we present plots of event times for several 
manholes in Figure 4. We see a clear grouping of events which is consis¬ 
tent with self-exciting behavior. These observations lead us to include the 
g 2 term in (1). The behavior we observe could not be easily explained using 
a simple random effects model; an attempt to do this is within Section 4 of 
the supplementary material [Ertekin, Rudin and McCormick (2015)]. 

Next, we evaluate whether subsequent events continue to increase propen¬ 
sity for another event or whether the risk in the most troubled manholes 
“saturates” and multiple manhole events in a row have diminishing-returns 
on the conditional probabilities. Figure 3(b) shows the saturation effect. The 
^-axis of this plot contains raw empirical probabilities of another event. The 
x-axis are sums of effects from previous recent events (sums of g 2 values). 
If Figure 3(b) were linear, we would not see diminishing returns. That is, 
a linear trend in Figure 3(b) would indicate that each subsequent event in¬ 
creases the likelihood of another event by the same amount. Instead, we see 
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Eig. 4. Time of events in distinet manholes in the Manhattan data that demonstrate the 
self-excitating behavior. The x-axis is the number of days elapsed sinee the day of first 
reeord in the data set and the markers indieate the aetual time of events. 


a distinct curve, indicating that the additional increase in risk decreases as 
the number of events rises. To further aid in developing a functional form 
of the model, we fit smooth curves to the data displayed in Figure 3(a) and 
(b). The process for fitting these smooth curves, as well as simulation exper¬ 
iments for validation, is described in detail in the supplementary material 
[Ertekin, Rudin and McCormick (2015)]. The fitted values for the smooth 
curves are 

. _ 11-62 

51 (t) = 16.98 X (l - log(l + x ^). 

These estimates inspired the parameterizations we provided in equation (2). 
We also estimated the baseline hazard rate Aq and baseline change Ci for 
Manhattan as Aq = 2.4225 x 10“^ and Ci = 0.0512. 

Because the inspection program is relatively new, we were not able to 
trace out the full functions ^4 and ^ 3 ; however, we strongly hypothesize 
that the inspections have an effect that wears off over time based on a 
matched pairs study [see, e.g., Passonneau et al. (2011)], where we showed 
that for manholes that had been inspected at least twice, the second manhole 
inspection does not lead to the same reduction in vulnerability as the first 
manhole inspection does. In what follows, we will show how the parameters 
of 92 ^ 93 and ^4 can be made to specialize to each individual manhole 
adaptively. 

Inspired by the CF estimator, we use the family of functions below for 
fitting power-grid data, where ai, 61 , as, 63 , ^ and 7 are parameters that 
can be either modeled or fitted: 


gi{u) = ai X 



1 

log 2 


log(l + e-^n), 


92{t) = 


1 


1 + ’ 
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53M = «3X + 9S) = Y^f 

The factors of log 2 ensure that the vulnerability level is not negative. 

We need some notation in order to encode the possibility of multiple 
manholes. In the case that there are multiple entities, there are P time- 
series, each corresponding to a unique entity p. For medical applications, 
each p is a patient, and for the electrical grid reliability application, p is a 
manhole. Our data consist of events {t{p)^}p^e: inspections {t{p)^}p^i and, 
additionally, we may have covariate information Mpj about every entity p, 
with covariates indexed by j. Covariates for the medical application might 
include a patient’s gender, age at the initial time, race, etc. For the manhole 
events application, covariates include the number of main phase cables in 
the manhole (number of current carrying cables between two manholes), 
the total number of cable sets (total number of bundles of cables) including 
main, service and streetlight cables, and the age of the oldest cable set within 
the manhole. All covariates were normalized to be between —0.5 and 0.5. 

Within the Bayesian framework, we can naturally incorporate the covari¬ 
ates to model functions Xp for each p adaptively. Consider (3 in the expression 
for the self-excitation function p2 above. The (3 terms depend on individual- 
level covariates. In notation. 

The are assumed to be generated via a hierarchical model of the form 

/3 = log(l -h e“^^) where v ^ A^(0,(j^) 

are the regression coefficients and M is the matrix of observed covariates. 
The y^^^’s are modeled hierarchically in the same manner, 

7 = log(l -h e“^^) with u; ^ A^(0 ,(j5 ). 

This permits slower or faster decay of the self-exciting and self-regulating 
components based on the characteristics of the individual. For the electrical 
reliability application, we have noticed that manholes with more cables and 
older cables tend to have faster decay of the self-exciting terms, for instance. 

Demonstrating the need for the saturation funetion in the RPP model. 
In the previous section we used exploratory tools on the Manhattan data 
to demonstrate diminishing returns in risk for multiple subsequent events. 
In what follows, we link the exploratory work in the last section with our 
modeling framework, demonstrating how the standard linear self-exciting 
process can produce unrealistic results under ordinary conditions. 
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First we show that the self-excitation term can cause the rate of events 
X{t) to increase without bound. To show this, we considered a baseline vul¬ 
nerability of Ao = 0 . 01 , setting Ci = 0 . 1 , used g 2 {t) = ^^^o,oo 5 t ? omitted 
the other components of the model (no inspections, no saturation gi). The 
self-excitation eventually causes the rate of events to escalate unrealistically 
as shown in Figure 5 (upper left). The embedded subfigure is a zoomed-in 
version of the first 1500 time steps. 

When we include the saturation function ^i, the excitation is controlled, 
and the probability of an event no longer increases to unreasonable levels. 
We used gi{(jj) = 1 — j^^log(l -h e“^), so that the vulnerability X{t) can 
reach to a maximum value of 0.021. The result is in Figure 5 (upper right). 

Now we show the effect of the saturation function g^ in the presence 
of repeated inspections. If no manhole events occur and the manhole is 
repeatedly inspected, then using the linear SEPP model, its vulnerability 
levels can become arbitrarily close to 0. This is not difficult to show, and we 
do this in Figure 5 (lower left). Here we used Aq = 0.2, g^it) = 
omitted ^ 3 . We ran the same experiment but with saturation, specifically. 



(a) Model with self-excitation function ^2 
(without saturation function gi and inspections) 
and a zoomed view of the first 1500 days 



(c) Model with self-regulation function 
(without saturation function gs and events) 



0 500 1000 1500 2000 2500 


Time 

(b) Model with self-excitation function g 2 
and saturation function gi (no inspections) 



(d) Model with self-regulation function g^ 
and saturation function gs (no events) 


Eig. 5. The effect of the saturation functions gi and gs. The dots on the time axis in 
subfigures (a) and (b) indicate the times of events, and the dots in subfigures (c) and (d) 
indicate the times of inspections. The figures on the right include saturation, and the figures 
on the left do not include saturation. Without saturation, the self-excitation function in 
(a) grows unbounded, whereas the self-regulation function in (c) drops to an unrealistic 
level of zero. The effects of the saturation functions in Figures (b) and (d) keep g 2 and g^ 
within realistic bounds. 
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with g 3 (co) = 1 — log(l + e^). The results in Figure 5 (lower right) show 
that the saturation function never lets the vulnerability drop unrealistically 
far below the baseline level. 


4. Fitting RPP statistical models. In this section we describe our Bayesian 
framework for inference using RPP models. The RPP intensity in equa¬ 
tion (1) provides structure to capture self-excitation, self-regulation and 
saturation. First, in Section 4.1 we describe the likelihood for the RPP sta¬ 
tistical model. We then describe prior distributions and our computational 
strategy for sampling from the posterior in Section 4.2. Section 4.3 then 
details the values we use in making predictions. Along with the results pre¬ 
sented here, we extensively evaluated our inference strategy using a series 
of simulation experiments, where the goal is to recover parameters of simu¬ 
lated data for which there is ground truth. We further applied the method 
of maximum likelihood to the Manhattan power-grid data. Details of these 
additional experiments are in the supplementary material [Ertekin, Rudin 
and McCormick (2015)]. 


4.1. RPP likelihood. This section describes the likelihood for the RPP 
statistical model. Using the intensity function described in Section 3, the 
RPP likelihood is derived using the likelihood formula for a nonhomogeneous 
Poisson process over the time interval [0,Tniax]: 

log £({4^^,..., }p; i;, ai, M) 

(4) 

P 

p=l 


y^log(Ap(4^4) - f Xp{u)du , 
Le=l 40 J 


where v are coefficients for covariates represented by the matrix M. The 
covariates are the number of main phase cables in the manhole (number of 
current carrying cables between two manholes), the total number of cable 
sets (total number of bundles of cables) including main, service and street¬ 
light cables, and the age of the oldest cable set within the manhole. All 
covariates were normalized to be between —0.5 and 0.5. 


4.2. Bayesian RPP. Developing a Bayesian framework facilitates shar¬ 
ing of information between observably similar manholes, thus making more 
efficient use of available covariate information. The RPP model encodes 
much of our prior information into the shape of the rate function given in 
equation (1). As discussed in Section 3, we opted for a simple, parsimonious 
model that imposes mild regularization and information sharing without 
adding substantial additional information; specifically, we use diffuse Gaus¬ 
sian priors on the log scale for each regression coefficient. 
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We fit the model using Approximate Bayesian Computation [Diggle and 
Gratton (1984)]. The principle of Approximate Bayesian Computation (ABC) 
is to randomly choose proposed parameter values, use those values to gen¬ 
erate data, and then compare the generated data to the observed data. If 
the difference is sufficiently small, then we accept the proposed parameters 
as draws from the approximate posterior. To do ABC, we need two things: 
(i) to be able to simulate from the model and (ii) a summary statistic. To 
compare the generated and observed data, the summary statistic from the 
observed data, 5({4^\ ..., is compared to that of the data simu- 

lated from the proposed parameter values, If 

values are similar, it indicates that the proposed parameter values may yield 
a useful model for the data. 

A critical difference between updating a parameter value in an ABC iter¬ 
ation versus, for example, a Metropolis-Hastings step is that ABC requires 
simulating from the likelihood, whereas Metropolis-Hastings requires evalu¬ 
ating the likelihood. In our context, we are able to both evaluate and simulate 
from the likelihood with approximately the same computational complex¬ 
ity. ABC has some advantages, namely, that we have meaningful summary 
statistics, discussed below. Further, in our case it is not particularly com¬ 
putationally challenging, as we already extensively simulate from the model 
as a means of evaluating hypothetical inspection policies. We evaluated the 
adequacy of this method extensively in simulation studies presented in the 
supplementary material [Ertekin, Rudin and McCormick (2015)]. 

A key conceptual aspect of ABC is that one can choose the summary 
statistic to best match the problem. The sufficient statistic for the RPP is 
the vector of event times, and thus gives no data reduction—so we choose 
other statistics. One important insight in constructing our summary statistic 
is that changing the parameters in the RPP model alters the distribution of 
times between events. The histogram of time differences for a homogenous 
Poisson Process, for example, has an exponential decay. The self-exciting 
process, on the other hand, has a distribution resembling a lognormal be¬ 
cause of the positive association between intensities after an event occurs. 
Altering the parameters of the RPP model changes the intensity of self¬ 
excitation and self-regulation, thus altering the distribution of times between 
events. We construct our first statistic, therefore, by examining the KL di¬ 
vergence between the distribution of times between events in the data and 
the distribution between event times in the simulated data. We do this for 
each of our proposed parameters. Examining the distribution of times be¬ 
tween events, though not the true sufficient statistic, captures a concise and 
low-dimensional summary of a key feature of the process. This statistic does 
not, however, capture the overall prevalence of events in the process. Since 
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we focus only on the distribution of times between events, various processes 
with different overall intensity could produce distributions with similar KL 
divergence to the data distribution. We therefore introduce a second statis¬ 
tic that counts the total number of events. We contend that together these 
statistics represent both the spacing and the overall scale (or frequency) of 
events. Thus, the two summary measures we use are as follows: 

1. DNE: The difference in the number of events in the simulated and 
observed data. 

2. KL: The Kullback-Leibler divergence between two histograms, one 
from the observed data and one from the real data. These are histograms of 
time differences between events. 

For the NYC data, we visualized three-dimensional parameter values, 
both for DNE (in Figure 6 ) and KL (in Figure 7) metrics. In both figures, 
smaller values (dark blue) are better. As seen, the regions where KL and 
DNE are optimized are very similar. 

Denoting the probability distribution of the actual data as P and the 
probability distribution of the simulated data as KL Divergence is com¬ 
puted as 

KL(P||0„) = gl„(|^)F(bi„). 

As mentioned previously in this section, the Bayesian portion of our model 
is relatively parsimonious but does impose mild regularization and encour¬ 
ages stability. We require a distribution tt over parameter values. If covariates 
are not used, tt is a distribution over /3 (and 7 if inspections are present). 
If covariates are used, tt is a distribution over v and u?. One option for tt 
is a uniform distribution across a grid of reasonable values. Another op¬ 
tion, which was used in our experiments, is to simulate from diffuse Gaus¬ 
sian/weakly informative priors on the log scale [e.g., draw log(z/j) iV(0,5)]. 
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Eig. 6. DNE for Manhattan data set. Each axis corresponds to the coefficient for one of 
the covariates. The magnitude of DNE is indicated by the color. 
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Fig. 7. KL for Manhattan data set. Each axis corresponds to the coefficient for one of 
the covariates. The magnitude of KL is indicated by the color. 


We assumed that Ci and ai can be treated as tuning constants to be esti¬ 
mated using the CF estimator method, though it is possible to define priors 
on these quantities as well if desired. 

There is an increasingly large literature in both the theory and imple¬ 
mentation of ABC [see, e.g., Fearnhead and Prangle ( 2012 ), Beaumont et al. 
(2009), Drovandi, Pettitt and Faddy (2011)] that could be used to produce 
estimates of the full posterior. In the supplementary material [Ertekin, Rudin 
and McCormick (2015)], we present an importance sampling algorithm as 
one possible approach. In our work, however, the goal is to estimate the 
posterior mode, which we then use for prediction. To verify our ABC proce¬ 
dure, we used simulated ground truth data with known /3 and 7 values, and 
attempted to recover these values with the ABC method, for both the DNE 
and KL metrics. We performed extensive simulation studies to evaluate this 
method and full results are given in the supplementary material [Ertekin, 
Rudin and McCormick (2015)]. 

In the next section we discuss how we estimate the posterior mode by 
using a manifold approximation to the region of high posterior density. We 
begin by generating a set of proposed parameter values using the prior distri¬ 
butions. Consistent with ABC, we simulate data from each set of candidate 
values and compare the simulated data to our observed data using the KL 
and DNE statistics described above. (From here, we could, e.g., define a ker¬ 
nel and accept draws with a given probability as in importance sampling. 
Instead, our goal is estimating the posterior mode to find parameters for the 
policy decision, as we describe next.) 

4.3. Choosing parameter values for the poliey deeision. For the policy 
simulation in Section 6 we wish to choose one set of parameter values to 
inform our decision. In order to choose a single best value of the parameters, 
we fit a polynomial manifold to the intersection of the bottom 10% of KL 
values and the bottom 10% of DNE values. Defining 'Ui, V 2 and vs as the 
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Eig. 8. Fitted manifold of v values with smallest KL divergenee and smallest DNE. 

coefficients for number of main phase cables, age of oldest main cable set 
and total number of sets features, the formula for the manifold is 

'^3 = —9.6 — 0.98'L’i — 0.13'L’2 — 1.1 x 10 “^('L’i)^ — 3.6 x 10 “^'L’i'L’2 
+ 4.67X 

which is determined by a least squares fit to the data. The fitted manifold 
is shown in Figure 8 along with the data. 

We then optimized for the point on the manifold closest to the origin. This 
implicitly adds regularization, as it chooses the parameter values closest to 
the origin. This point is vi = —4.6554, V 2 = —0.5716, and = —4.8028. 

Note that cable age (corresponding to the second coefficient) is not the 
most important feature defining the manifold. As previous studies have 
shown [Rudin et al. (2010)], even though there are very old cables in the 
city, the age of cables within a manhole is not alone the best predictor of 
vulnerability. Now we also know that it is not the best predictor of the rate 
of decay of vulnerability back to baseline levels. This supports Con Edison’s 
goal to prioritize the most vulnerable components of the power-grid, rather 
than simply replacing the oldest components. The features that mainly de¬ 
termine decay of the self-excitation function g 2 are the number of main phase 
cables and the number of cable sets. As either or both of these numbers in¬ 
crease, decay rate (3 increases, meaning that manholes with more cables tend 
to return to baseline levels faster than manholes with fewer cables. 

5. Predicting events on the NYC power-grid. Our first experiment aims 
to evaluate whether the CF estimator or the feature-based strategy intro¬ 
duced above is better in terms of identifying the most vulnerable manholes. 
To do this, we selected 5000 manholes (rank 1001-6000 from the project’s 
current long-term prediction model). These manholes have similar vulnera¬ 
bility levels, which allows us to isolate the self-exciting effect without mod¬ 
eling the baseline level. Using both the feature-based (3 (ABC, with KL 
metric) and constant (3 (CF estimator method) strategies, the models were 
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Fig. 9. Ranking differences between feature-based and constant (nonfeature-based) f3 
strategies. 

trained on data through 2009, and then we estimated the vulnerabilities of 
the manholes on December 31st, 2009. These vulnerabilities were used as 
the initial vulnerabilities for an evaluation on the 2010 event data. 2010 is 
a relevant year because the first inspection cycle ended in 2009. All man¬ 
holes had been inspected at least once, and many were inspected toward 
the end of 2009, which stabilizes the inspection effects. For each of the 53K 
manholes and at each of the 365 days of 2010, when we observed a serious 
event in a manhole p, we evaluated the rank of that manhole with respect to 
both the feature-based and nonfeature-based models, where rank represents 
the number of manholes that were given higher vulnerabilities than man¬ 
hole p. As our goal is to compare the relative rankings provided by the two 
strategies, we consider only events where the vulnerabilities assigned by both 
strategies are different than the baseline vulnerability. Figure 9 displays the 
ranks of the manholes on the day of their serious event. A smaller rank indi¬ 
cates being higher up the list, thus lower is better. Overall, we find that the 
feature-based /3 strategy performs better than the nonfeature-based strategy 
over all of the rank comparisons in 2010 (p-value 0.09, sign test). Our results 
mainly illustrate that using different decay rates on past events for different 
types of manholes leads to better predictions. Recall from Section 4.3 that 
larger manholes tend to recover faster from previous events. The approach 
without the features ignores the differences between manholes, and uses the 
same decay rate, whereas the feature-based RPP takes these decay rates 
into account in making predictions. 

In the second experiment, we compared the feature-based (3 strategy to 
the Cox proportional-hazard model, which is commonly used in survival 
analysis to assess the probability of failure in mechanical systems. We em¬ 
ployed this model to assess the likelihood of a manhole having a serious event 
on a particular day. For each manhole, we used the same three static co¬ 
variates as in the feature-based (3 model, and developed four time-dependent 
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features. The time-varying features for day t are (1) the number of times the 
manhole was a trouble hole (source of the problem) for a serious event until 

(2) the number of times the manhole was a trouble hole for a serious event 
in the last year, (3) the number of times the manhole was a trouble hole for 
a precursor event (less serious event) until and (4) the number of times 
the manhole was a trouble hole for a precursor event in the last year. The 
feature-based (3 model currently does not differentiate serious and precursor 
events, though it is a direct extension to do this if desired. The model was 
trained using the coxph function in the R survival package using data prior 
to 2009, and then predictions were made on the test set of 5000 manholes 
in the 2010 data set. These predictions were transformed into ranked lists 
of manholes for each day. We then compared the ranks achieved by the Cox 
model with the ranks of manholes at the time of events. The difference of 
aggregate ranks was in favor of the feature-based (3 approach (p-value 7e-06, 
sign test), indicating that the feature-based (3 strategy provides a substantial 
advantage in its ability to prioritize vulnerable manholes. 

The Cox model we compared with represents a “long-term” model simi¬ 
lar to what we were using previously for manhole event prediction on Con 
Edison data [Rudin et al. (2010)]. The Cox model considers different infor¬ 
mation, namely, counts of past events. These counts are time-varying, but 
the past events do not smoothly wear off in time as they do for the RPP. 
The fact that the RPP model is competitive with the Cox model indicates 
that the effects of past manhole events do wear off with time (in agreement 
with Figure 3 where we traced the decay directly using data). The saturating 
elements of the model ensure that the model is physically plausible, since we 
showed in Section 3 that the results could be unphysical (with rates going 
to 0 or above 1) without the saturation. 

6. Making broader policy decisions using RPPs. Because the RPP model 
is a generative model, it can be used to simulate the future, and thus as¬ 
sist with broader policy decisions regarding how often inspections should be 
performed. This can be used to justify allocation of spending. Con Edison’s 
existing inspection policy is a combination of targeted periodic inspections 
and ad hoc inspections. The targeted inspections are planned in advance, 
whereas the ad hoc inspections are unscheduled. An ad hoc inspection could 
be performed while a utility worker is in the process of, for instance, in¬ 
stalling a new service cable to a building or repairing an outage. Either 
source of inspection can result in an urgent repair (Type I), an important 
but not urgent repair (Type II), a suggested structural repair (Types III 
and IV), or no repair, or any combination of repairs. Urgent repairs need 
to be completed before the inspector leaves the manhole, whereas Type IV 
repairs are placed on a waiting list. According to the current inspections pol¬ 
icy, each manhole undergoes a targeted inspection every 5 years. The choice 


REACTIVE POINT PROCESSES 


19 


of inspection policy to simulate can be determined very flexibly, and any 
inspection policy and hypothesized effect of inspections can be examined 
through simulation. 

As a demonstration, we conducted a simulation over a 20 year future 
time horizon that permits a cost-benefit analysis of the inspection program, 
when targeted inspections are performed at a given frequency. To do this 
simulation, we require the following: 

• A characterization of manhole vulnerability. For Manhattan, this is learned 
from the past using the ABC RPP feature-based (3 training strategy for 
the saturation function gi and the self-excitation function g 2 discussed 
above. Saturation and self-regulation functions g^ and g^ for the inspec¬ 
tion program cannot yet be learned due to the newness of the inspection 
program and are discussed below. 

• An inspection policy. The policy can include targeted, ad hoc or history- 
based inspections. We chose to evaluate “bright line” inspection policies, 
where each manhole is inspected once in each Y year period, where Y is 
varied (discussed below). We also included an ad hoc inspection policy 
that visits 3 manholes per day on average. 

Effect of inspections: The effect of inspections on the overall vulnerability 
of manholes were designed in consultation with domain experts. The choices 
are somewhat conservative, so as to give a lower bound for costs. The effect 
of an urgent repair (Type I) is different from the effect of less urgent repairs 
(Types II, III and IV). For all inspection types, after 1 year beyond the 
time of the inspection, the effect of the inspection decays to, on average, 
85% of its initial effect, in agreement with a short-term empirical study on 
inspections. (There is some uncertainty in this initial effect, and the initial 
drop in vulnerability is chosen from a normal distribution so that after one 
year the effect decays to a mean of 85%.) For Type I inspections, the effect 
of the inspection decays to baseline levels after approximately 3000 days, 
and for Types II, III and IV, which are more extensive repairs, the effect 
fully decays after 7000 days. In particular, we use the following ^4 functions: 


1 


(5) \t) = -83.7989 x (r x 5 x 10"^ + 3.5 x 10"^) x 


I _|_ gO.OOlSt ’ 
1 


(6) = -49.014 X (r X 5 X 10“^ + 7 x 10“^) x 


I _j_ g 0 . 00068 t ’ 


where r is randomly sampled from a standard normal distribution. For all 
inspection types, we used the following g^ saturation function: 
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(b) gA for Types II, III, IV 



Eig. 10. Saturation and self-regulation funetions gs and g^ for simulation. 


which ensures that subsequent inspections do not lower the vulnerability to 
more than 60% of the baseline vulnerability. Sampled functions for Type 
I and Types II, II, IV, along with are shown in Figure 10. 

One targeted inspection per manhole was distributed randomly across Y 
years for the bright line V-year inspection policies, and 3 x 365 = 1095 ad 
hoc inspections for each year were uniformly distributed, which corresponds 
to 3 ad hoc inspections per day for the whole power grid on average. During 
the simulation, when we arrived at a time step with an inspection, the 
inspection outcome was Type I with 25% probability, or one of Types II, III 
or IV, with 25% probability. In the rest of the cases (50% probability), the 
inspection was clean, and the manhole’s vulnerability was not affected by 
the inspection. If the inspection resulted in a repair, we sampled r randomly 
and randomly chose the inspection outcome (Type I or Types II, III, IV). 
This percentage breakdown was observed approximately for a recent year of 
inspections in NYC. 

To initialize manhole vulnerabilities for a bright line policy of Y years, we 
simulated the previous Y-year inspection cycle and started the simulation 
with the vulnerabilities obtained at the end of this full cycle. 

Simulation results: We simulated events and inspections for 53.5K man¬ 
holes for bright line policies ranging from Y = 1 year to Y = 20 years. Natu¬ 
rally, a longer inspection cycle corresponds to fewer daily inspections, which 
translates into an increase in overall vulnerabilities and an increase in the 
number of events. This is quantified in Figure 11, which shows the projected 
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Inspection Frequency in Years Inspection Frequency in Years 

Fig. 11. Number of events and inspeetions based on bright line poliey. Number of years 

Y for the bright line poliey is on the horizontal axis in both figures. The left figure shows 
the number of inspeetions, the right figure shows the number of events. 

number of inspections and events for each Y year bright line policy. If we 
change from a 6 year bright line inspection policy to a 4 year policy, we esti¬ 
mate a reduction of approximately 100 events per year. The relative costs of 
inspections and events can thus be considered in order to justify a particular 
choice of Y for the bright line policy. 

Let us say, for instance, that each inspection costs Cj and each event 
costs Ce- The simulation results allow us to denote the forecasted expected 
number of events over a period of time T as a function of the inspection 
frequency T, which we denote by Ne{Y^T). The value of Ne{Y^T) comes 
directly from the simulation, as plotted in Figure 11. Let us make a decision 
for Y for P total manholes, over a period T. To do this, we would choose a 

Y that minimizes the total cost 

Ce X Ne{Y,T)yCi xPx T/Y. 

This line of reasoning provides a quantitative mechanism for decision making 
and can be used to justify a particular choice of inspection policy. 

7. Conclusion. Keeping our electrical infrastructure safe and reliable is 
of critical concern, as power outages affect almost all aspects of our society, 
including hospitals, financial centers, data centers, transportation and su¬ 
permarkets. If we are able to combine historical data with the best available 
statistical tools, it will be possible to impact our ability to maintain an ever 
aging and growing power grid. In this work, we presented a methodology for 
modeling power-grid failures that is based on natural assumptions: (i) that 
power failures have a self-exciting property, which was hypothesized by Con 
Edison engineers, (ii) that the power company’s actions are able to regulate 
vulnerability levels, (hi) that the effects on the vulnerability level of past 
events or repairs can saturate, and (iv) that vulnerability estimates should 
be similar between similar entities. We have been able to show directly (using 
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the CF estimator for the RPP) that the self-exciting and saturation assump¬ 
tions hold. We demonstrated through experiments on past power-grid data 
from NYC, and through simulations, that the RPP model is able to capture 
the relevant dynamics well enough to predict power failures better than the 
current approaches in use. 

The modeling assumptions that underlie RPPs can be directly ported to 
other problems. RPPs are a natural fit for problems in healthcare, where 
medical conditions cause self-excitation and treatments provide regulation. 
Through the Bayesian framework we introduced, RPPs extend to a broad 
range of problems where predictive power can be pooled among multiple 
related entities, whether manholes or medical patients. 

The results presented in this work show for the first time that manhole 
events can be predicted in the short term, which was previously thought not 
to be possible. Knowing how one might do this permits us to take preventive 
action to keep vulnerability levels low, and can help make broader policy 
decisions for power-grid maintenance through simulation of many uncertain 
futures, simulated over any desired policy. 

SUPPLEMENTARY MATERIAL 

Supplementary material for ‘‘Reactive point processes: A new approach 
to predicting power failures in underground electrical systems” 

(DOI: 10.1214/14-AOAS789SUPP; .pdf). The supplementary material in¬ 
cludes an expanded related work section, conditional frequency estimator 
(CF estimator) for the RPP, experiments with a maximum likelihood ap¬ 
proach, a description of the inspection policy used in Section 6, an analysis 
of Manhattan data using random effects model and simulation studies for 
validating the fitting techniques for the models in the paper. It also includes 
a description and link for a publicly available simulated data set that we 
generated, based on statistical properties of the Manhattan data set. 
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